MULTIVARIATE NORMAL APPROXIMATION WITH STEIN'S
METHOD OF EXCHANGEABLE PAIRS UNDER A GENERAL
LINEARITY CONDITION

By Gesine Reinert and Adrian Röllin

University of Oxford and National University of Singapore 

In this paper we establish a multivariate exchangeable pairs ap- 
proach within the framework of Stein's method to assess distribu- 
tional distances to potentially singular multivariate normal distribu- 
tions. By extending the statistics into a higher-dimensional space, 
we also propose an embedding method which allows for a normal 
approximation even when the corresponding statistics of interest do 
not lend themselves easily to Stein's exchangeable pairs approach. To 
illustrate the method, we provide the examples of runs on the line as 
well as double-indexed permutation statistics. 

1. Introduction. Stein's method was first published in Stein (1972) to 
assess the distance between univariate random variables and the normal 
distribution. This method has proved particularly powerful in the presence 
of both local dependence and weak global dependence. 

A coupling at the heart of Stein's method for univariate normal approximation is the method of exchangeable pairs; see Stein (1986). Assume that $W$ is a univariate random variable with $\mathbb{E}W = 0$ and $\mathbb{E}W^2 = 1$, and assume that $W'$ is a random variable such that $(W, W')$ makes an exchangeable pair. Assume further that there is a number $\lambda > 0$ such that the conditional expectation of $W' - W$ with respect to $W$ satisfies

$$\mathbb{E}^W(W' - W) = -\lambda W. \tag{1.1}$$

Heuristically, (1.1) can be understood as a linear regression condition. If $(W, W')$ were bivariate normal with correlation $\rho$, then

$$\mathbb{E}^W W' = \rho W,$$

and (1.1) would be satisfied with $\lambda = 1 - \rho$. If $W$ were close to normal, then so would be $W'$, and it would not be unreasonable to assume that (1.1) is close to being satisfied.

In this spirit, the univariate theorem of Stein (1986) has been extended by Rinott and Rotar (1997). With the same basic setup as in Stein (1986), they generalize (1.1) by assuming that there is a number $\lambda > 0$ and a random variable $R = R(W)$ such that

$$\mathbb{E}^W(W' - W) = -\lambda W + R. \tag{1.2}$$

Note that, unlike condition (1.1), this is not a condition in the strict sense, as we can define $R := \mathbb{E}^W(W' - W) + \lambda W$ for any $\lambda$; however, we always have $\mathbb{E}R = 0$.

One of the results of Rinott and Rotar (1997) is that

$$\sup_x \bigl|\mathbb{P}[W \le x] - \mathbb{P}[Z \le x]\bigr| \le \frac{6}{\lambda}\sqrt{\operatorname{Var}\mathbb{E}^W(W' - W)^2} + 6\sqrt{\frac{1}{\lambda}\mathbb{E}|W' - W|^3} + \frac{19}{\lambda}\sqrt{\operatorname{Var}R}, \tag{1.3}$$

where $Z$ has standard normal distribution. So clearly, representation (1.2) is useful only if $\lambda^{-1}\sqrt{\operatorname{Var}R} = o(1)$. In this case, if $\lambda_1$ and $\lambda_2$ stem from two different representations (1.2) for which $\lambda_i^{-1}\sqrt{\operatorname{Var}R_i} = o(1)$ for $i = 1, 2$, then it is easy to see that $|\lambda_1 - \lambda_2|/(\lambda_1 + \lambda_2) = o(1)$; in this sense, $\lambda$ is asymptotically unique. Rinott and Rotar (1997) then apply bound (1.3) to the number of ones in the anti-voter model, and to weighted $U$-statistics. Röllin (2008) provides a proof of a variant of (1.3) which does not use exchangeability but only $\mathscr{L}(W') = \mathscr{L}(W)$.

Stein's method has been extended to many other distributions; for an overview, see, for example, Reinert (2005). For multivariate normal approximations the method was first adapted by Barbour (1990) and Götze (1991), viewing the normal distribution as the stationary distribution of an Ornstein-Uhlenbeck diffusion, and using the generator of this diffusion as a characterizing operator for the normal distribution. Subsequent authors have used this generator approach for multivariate normal approximation with different variants, such as the local approach and the size-biasing approach by Goldstein and Rinott (1996) and Rinott and Rotar (1996), and the zero-biasing approach by Goldstein and Reinert (2005).

The exchangeable pair approach, in contrast, while having proved useful in non-normal contexts [see Chatterjee, Diaconis and Meckes (2005), Chatterjee, Fulman and Röllin (2006) and Röllin (2007)], remained restricted to the one-dimensional setting until very recently. A main stumbling block was that the extension of condition (1.2) to the multivariate setting is not obvious from the viewpoint of Stein's method.
In Chatterjee and Meckes (2008), this issue was finally addressed. They propose the condition that, for all $i = 1, \dots, d$,

$$\mathbb{E}^W(W_i' - W_i) = -\lambda W_i, \tag{1.4}$$

for a fixed number $\lambda$, where now $W = (W_1, \dots, W_d)$ and $W' = (W_1', \dots, W_d')$ are identically distributed $d$-vectors with uncorrelated components (an extension to the additional remainder term $R$ was not considered, but would be straightforward). They employ such couplings to bound the distance to the standard multivariate normal distribution. Using the same argument as Röllin (2008), Chatterjee and Meckes (2008) are able to give proofs of their theorems without using exchangeability and apply them successfully to various multivariate applications.

Applying a similar heuristic as for (1.1), however, if $(W, W')$ were jointly normal, with mean vector $0$ and covariance matrix

$$\Sigma_0 = \begin{pmatrix} \Sigma & \tilde\Sigma \\ \tilde\Sigma & \Sigma \end{pmatrix}, \tag{1.5}$$

then $\mathbb{E}^W W' = \tilde\Sigma\Sigma^{-1}W$ [see, e.g., Mardia, Kent and Bibby (1979), page 63, Theorem 3.2.4], in which case

$$\mathbb{E}^W(W' - W) = -(\mathrm{Id} - \tilde\Sigma\Sigma^{-1})W; \tag{1.6}$$

here $\mathrm{Id}$ denotes the identity matrix. Again, if $(W, W')$ is approximately jointly normal, then we expect (1.6) to be approximately satisfied. This heuristic leads to the condition that

$$\mathbb{E}^W(W' - W) = -\Lambda W + R \tag{1.7}$$

for an invertible $d \times d$ matrix $\Lambda$ and a remainder term $R = R(W)$. For $R = 0$, even if $\Sigma = \mathrm{Id}$, we would obtain $\Lambda = \mathrm{Id} - \tilde\Sigma$, which in general is not diagonal. Hence, we argue that (1.7) is not only more general, but also more natural than (1.4).

Different exchangeable pairs will lead to different $\Lambda$ and $R$ in (1.7); our embedding method suggests suitable decompositions. Indeed, for a specific exchangeable pair $(W, W')$ at hand, it is often far from obvious whether this pair will satisfy the linearity condition (1.7) with $R$ of the required small order, unless equal to zero. Consider the case of 2-runs. For a sequence of i.i.d. Bernoulli distributed random variables $\xi_1, \dots, \xi_n$ such that $\mathbb{P}[\xi_1 = 1] = p$, define the centered number of 2-runs

$$V_2 := \sum_{i=1}^n (\xi_i\xi_{i+1} - p^2),$$

where we let $\xi_{n+1} := \xi_1$. The most natural construction of an exchangeable pair in the spirit of Stein (1986) is to pick uniformly a $\xi_I$ and replace it by an independent copy $\xi_I'$. Denote by $V_2'$ the resulting number of 2-runs in the new sequence. It is easy to calculate (see Section 4.2) that

$$\mathbb{E}^{V_2}(V_2' - V_2) = -\frac{2}{n}V_2 + \frac{2p}{n}\mathbb{E}^{V_2}\sum_{i=1}^n(\xi_i - p). \tag{1.8}$$

The conditional expectation on the right-hand side of (1.8) is hard to calculate. Furthermore, it has the same order of magnitude as $V_2$. Also, the weighted $U$-statistics approach of Rinott and Rotar (1997) (Proposition 1.2) does not yield convergent bounds to the normal distribution. We propose the following approach to this problem. Keeping the above coupling, we define $V_1 := \sum_{i=1}^n(\xi_i - p)$ (and $V_1'$ accordingly) and consider the problem as a 2-dimensional problem $W := (V_1, V_2)$. Equation (1.8) now yields $\mathbb{E}^W(V_2' - V_2) = -\frac{2}{n}V_2 + \frac{2p}{n}V_1$, and further calculations reveal that $\mathbb{E}^W(V_1' - V_1) = -\frac{1}{n}V_1$, so that now (1.7) holds with

$$\Lambda = \frac{1}{n}\begin{pmatrix} 1 & 0 \\ -2p & 2 \end{pmatrix}$$

and $R = 0$. Using this embedding into a higher-dimensional setting, the problem now fits into our framework and allows not only for a normal approximation of the primary statistic, but for an approximation of the joint distribution of the primary and auxiliary statistics. For this embedding method, the generality of condition (1.7) is essential; see (4.1) later.
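
The claim that the enlarged vector of 2-run statistics satisfies (1.7) exactly can be checked numerically. The following Python sketch is not part of the original argument; it assumes the single-coordinate resampling described above and computes, for one fixed 0/1 sequence, the conditional expectation of $(V_1' - V_1, V_2' - V_2)$ given the whole sequence by exact enumeration in rational arithmetic. Because the result depends on the sequence only through $(V_1, V_2)$, it is also the conditional expectation given $W$. The helper names `centered_stats` and `conditional_drift`, the value of $p$ and the test sequence are illustrative choices.

```python
from fractions import Fraction


def centered_stats(xi, p):
    """Return (V1, V2) for a 0/1 sequence xi, using the torus convention."""
    n = len(xi)
    v1 = sum(xi) - n * p
    v2 = sum(xi[i] * xi[(i + 1) % n] for i in range(n)) - n * p * p
    return v1, v2


def conditional_drift(xi, p):
    """Exact E[(V1' - V1, V2' - V2) | xi] when one uniformly chosen
    coordinate is replaced by an independent Bernoulli(p) copy."""
    n = len(xi)
    v1, v2 = centered_stats(xi, p)
    d1 = d2 = Fraction(0)
    for i in range(n):                              # resampled index I
        for new, weight in ((1, p), (0, 1 - p)):    # value of the fresh copy
            y = list(xi)
            y[i] = new
            w1, w2 = centered_stats(y, p)
            d1 += weight * (w1 - v1) / n
            d2 += weight * (w2 - v2) / n
    return d1, d2


p = Fraction(1, 3)
xi = [1, 1, 0, 1, 0, 0, 1, 1]
n = len(xi)
v1, v2 = centered_stats(xi, p)
drift = conditional_drift(xi, p)
# -Lambda W with Lambda = (1/n) [[1, 0], [-2p, 2]] and W = (V1, V2)
predicted = (-v1 / n, (2 * p * v1 - 2 * v2) / n)
print(drift == predicted)
```

Exact fractions are used instead of floating point so that the comparison tests the identity itself rather than agreement up to rounding.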

The rest of the article is organized as follows. In the next section we prove 
an abstract nonsingular multivariate normal approximation theorem (The- 
orem 2.1) for smooth test functions. The explicit bound on the distance to 
the normal distribution is given in terms of the conditional variance, the 
absolute third moments and the variance of the remainder term. Proposi- 
tion 2.8 gives the extension to singular multivariate normal distributions, 
using Stein's method and the triangle inequality. To illustrate our results, 
we calculate the example of sums of i.i.d. variables. 

Section 3 uses the abstract theorem to obtain a similar result for nonsmooth test functions, such as indicators of convex sets. Adapting the approach by Rinott and Rotar (1996) to general multivariate normal approximation, Corollary 3.1 displays how the main terms involved in the error bounds for smooth test functions reappear in the bounds for nonsmooth test functions.

Section 4 discusses the above mentioned embedding method and illus- 
trates its application with a detailed treatment of runs on the line. We also 
sketch the application to double-indexed permutation statistics. 

The generality of (1.7) comes at the extra cost that now exchangeability seems almost inevitable. Indeed, in view of Röllin (2008), we were surprised that, in the multivariate setting, the exchangeability condition cannot be removed as easily as in the one-dimensional case. Therefore, the last section discusses the exchangeability condition, condition (1.7) and their implications.

Appendix A contains the proof of Corollary 3.1, and details of the runs 
example are in Appendix B. 

1.1. Notation. Random vectors in $\mathbb{R}^d$ are written in the form $W = (W_1, W_2, \dots, W_d)^t$, where $W_i$ are $\mathbb{R}$-valued random variables for $i = 1, \dots, d$. If $\Sigma$ is a symmetric, nonnegative definite matrix, we denote by $\Sigma^{1/2}$ the unique symmetric, nonnegative definite square root of $\Sigma$. Denote by $\mathrm{Id}$ the identity matrix, usually of dimension $d$. Throughout this article, $Z$ will denote a random vector having standard multivariate normal distribution, also of dimension $d$.

For ease of presentation, we abbreviate the transpose of the inverse of a matrix in the form $A^{-t} := (A^{-1})^t$.

Stein's method makes good use of Taylor expansions. For derivatives of smooth functions $h : \mathbb{R}^d \to \mathbb{R}$, we use the notation $\nabla$ for the gradient operator. For the sake of presentation, the partial derivatives are abbreviated as $h_i = \frac{\partial}{\partial x_i}h$, $h_{i,j} = \frac{\partial^2}{\partial x_i\,\partial x_j}h$, unless we would like to emphasise the dependence on the variables.

To derive uniform bounds, we shall employ the supremum norm, denoted by $\|\cdot\|$ for both functions and matrices. For a function $h : \mathbb{R}^d \to \mathbb{R}$, we abbreviate $|h|_1 := \sup_i \bigl\|\frac{\partial}{\partial x_i}h\bigr\|$, $|h|_2 := \sup_{i,j}\bigl\|\frac{\partial^2}{\partial x_i\,\partial x_j}h\bigr\|$, and so on, if the corresponding derivatives exist.

2. The distance to multivariate normal distribution in terms of smooth test functions. First we derive a bound on the distance between a multivariate target distribution and a multivariate normal distribution with the same mean vector (which is assumed to be $0$ in the sequel), and with the same, positive definite covariance matrix. We start by considering smooth test functions.

Theorem 2.1. Assume that $(W, W')$ is an exchangeable pair of $\mathbb{R}^d$-valued random vectors such that

$$\mathbb{E}W = 0, \qquad \mathbb{E}WW^t = \Sigma, \tag{2.1}$$

with $\Sigma \in \mathbb{R}^{d \times d}$ symmetric and positive definite. Suppose further that (1.7) is satisfied for an invertible matrix $\Lambda$ and a $\sigma(W)$-measurable random vector $R$. Then, if $Z$ has $d$-dimensional standard normal distribution, we have for every three times differentiable function $h$,

$$\bigl|\mathbb{E}h(W) - \mathbb{E}h(\Sigma^{1/2}Z)\bigr| \le \frac{|h|_2}{4}A + \frac{|h|_3}{12}B + \Bigl(|h|_1 + \frac{1}{2}d\,\|\Sigma\|^{1/2}|h|_2\Bigr)C, \tag{2.2}$$

where, with $\lambda^{(i)} := \sum_{m=1}^d |(\Lambda^{-1})_{m,i}|$,

$$A = \sum_{i,j=1}^d \lambda^{(i)}\sqrt{\operatorname{Var}\mathbb{E}^W(W_i' - W_i)(W_j' - W_j)},$$

$$B = \sum_{i,j,k=1}^d \lambda^{(i)}\,\mathbb{E}\bigl|(W_i' - W_i)(W_j' - W_j)(W_k' - W_k)\bigr|,$$

$$C = \sum_{i=1}^d \lambda^{(i)}\sqrt{\operatorname{Var}R_i}.$$
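
The weights $\lambda^{(i)}$ in Theorem 2.1 are the column sums of the absolute entries of $\Lambda^{-1}$. As a small illustration (a sketch under stated assumptions, not taken from the paper), the following Python code computes them for the 2-runs matrix $\Lambda = n^{-1}\bigl(\begin{smallmatrix}1 & 0\\ -2p & 2\end{smallmatrix}\bigr)$ of the Introduction. The Gauss-Jordan helper `inverse`, the function name `lambda_weights` and the values of $n$ and $p$ are illustrative choices.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    d = len(matrix)
    # Augment the matrix with the identity.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = next(r for r in range(col, d) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(d):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def lambda_weights(Lambda):
    """lambda^{(i)} = sum over m of |(Lambda^{-1})_{m,i}|."""
    inv = inverse(Lambda)
    d = len(inv)
    return [sum(abs(inv[m][i]) for m in range(d)) for i in range(d)]


n, p = 100, Fraction(1, 4)
Lambda = [[Fraction(1, n), Fraction(0)],
          [-2 * p / n, Fraction(2, n)]]
weights = lambda_weights(Lambda)
print(weights)
```

For this matrix $\Lambda^{-1} = n\bigl(\begin{smallmatrix}1 & 0\\ p & 1/2\end{smallmatrix}\bigr)$, so the weights are $n(1+p)$ and $n/2$; both grow linearly in $n$, which is why the terms $A$, $B$ and $C$ must be of smaller order than $1/n$ per summand for the bound (2.2) to be useful.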



Before we proceed with the proof, we illustrate Theorem 2.1 by means of 
the simple example of sums of i.i.d. random variables and make some further 
remarks. 

Corollary 2.2. Suppose that $W = (W_1, \dots, W_d)$ is such that, for each $i$, $W_i = \sum_{j=1}^n X_{i,j}$, where $X_{i,j}$, $i = 1, \dots, d$, $j = 1, \dots, n$, are i.i.d. with mean zero and variance $1/n$, so that the covariance matrix $\Sigma = \mathrm{Id}$. Assume further that there exist $0 < \beta, \gamma < \infty$ such that

Then, for every three times differentiable function $h$,

\Eh{w) - m{z)\ <±i^^\h\2 + ^i^is) . 

Proof. We construct an exchangeable pair by choosing a coordinate $I$ and a summand $J$ uniformly, such that $\mathbb{P}(I = i, J = j) = 1/(dn)$. If $I = i$, $J = j$, we replace $X_{i,j}$ by an independent copy $X'_{i,j}$; all other variables remain unchanged. Put

$$W_i' = W_i - X_{i,j} + X'_{i,j}$$

and $W_k' = W_k$ for $k \ne i$; denote by $W'$ the resulting $d$-vector. Then $(W, W')$ is exchangeable, and, in (1.7), $\Lambda = \frac{1}{dn}\mathrm{Id}$ with $R = 0$ and, hence, $C = 0$. For our bounds we note that $\lambda^{(i)} = dn$. We calculate that



3 

Thus, 

VarE^(iy/ - W.f < -1_ ^ VarX^. < 



J 
Moreover, by the construction, for $i \ne k$, $(W_i' - W_i)(W_k' - W_k) = 0$, and $(W_i' - W_i)(W_k' - W_k)(W_l' - W_l) = 0$, unless $i = k = l$. By assumption,



The result now follows directly from Theorem 2.1. □ 

Remark 2.3. Multivariate normal approximations for vectors of sums 
of i.i.d. random variables have been so intensively studied that there is not 
enough space to review all the results. The approach most similar to ours 
is found in Chatterjee and Meckes (2008), where instead of exchanging only 
one summand, a whole vector would be exchanged. Their results yield 

mw) - <^h\ < ^—^^^\h\, + 4^\h\2. 

Due to the different Stein equation used, the dependence on the dimen- 
sion differs, and the bounds are in terms of different derivatives of the test 
function. The overall similarity in this special case is apparent. 

Remark 2.4. If we were to normalize the random vectors in Theorem 2.1, denoting the normalization of $W$ by $\hat W := \Sigma^{-1/2}W$ and $\hat W' := \Sigma^{-1/2}W'$, then the conditions of the theorem remain satisfied for $(\hat W, \hat W')$ with $\hat\Sigma = \mathrm{Id}$ and $\hat\Lambda = \Sigma^{-1/2}\Lambda\Sigma^{1/2}$ as well as $\hat R = \Sigma^{-1/2}R$.



Remark 2.5. As a precursor to (1.7), in the context of multivariate 
zero-biasing, Goldstein and Reinert (2005) use the condition of the form 
(1.7) for A such that Ajj = p + l{i = j). 

After these remarks we proceed to the proof of Theorem 2.1, which is based on the Stein characterization of the normal distribution that $Y \in \mathbb{R}^d$ is a multivariate normal $\mathrm{MVN}(0, \Sigma)$ if and only if

$$\mathbb{E}\{\nabla^t\Sigma\nabla f(Y) - Y^t\nabla f(Y)\} = 0 \quad \text{for all smooth } f : \mathbb{R}^d \to \mathbb{R}. \tag{2.3}$$

We will need the following lemma to prove the theorem; however, see also Remark 2.4, Barbour (1990), Goldstein and Rinott (1996) and Götze (1991). The proof of Lemma 2.6 is routine.

Lemma 2.6. Let $h : \mathbb{R}^d \to \mathbb{R}$ be differentiable with bounded first derivative. Then, if $\Sigma \in \mathbb{R}^{d \times d}$ is symmetric and positive definite, there is a solution $f : \mathbb{R}^d \to \mathbb{R}$ to the equation

$$\nabla^t\Sigma\nabla f(w) - w^t\nabla f(w) = h(w) - \mathbb{E}h(\Sigma^{1/2}Z), \tag{2.4}$$

which holds for every $w \in \mathbb{R}^d$. If, in addition, $h$ is $n$ times differentiable, there is a solution $f$ which is also $n$ times differentiable and we have for every $k = 1, \dots, n$ the bound

$$\left|\frac{\partial^k f(w)}{\prod_{j=1}^k \partial w_{i_j}}\right| \le \frac{1}{k}\left\|\frac{\partial^k h}{\prod_{j=1}^k \partial w_{i_j}}\right\| \tag{2.5}$$

for every $w \in \mathbb{R}^d$.

Remark 2.7. Compared to the main theorem of Chatterjee and Meckes (2008), which only needs the existence of two derivatives, our Theorem 2.1 is more restrictive in the choice of test functions $h$. This reflects the fact that we make use of Lemma 2.6, which is motivated by Goldstein and Rinott (1996), whereas Chatterjee and Meckes (2008) prove new bounds on the solutions of (2.4), but only for $\Sigma = \mathrm{Id}$; see also Raič (2004) for similar results. The general result of Lemma 2.6, however, allows us to work with the unstandardized pair $(W, W')$, which not only usually simplifies the calculations, but also yields more informative bounds if the limiting covariance matrix is singular.

Proof of Theorem 2.1. Our aim is to bound $|\mathbb{E}h(W) - \mathbb{E}h(\Sigma^{1/2}Z)|$ by bounding $|\mathbb{E}\{\nabla^t\Sigma\nabla f(W) - W^t\nabla f(W)\}|$, where $f$ is the solution to the Stein equation (2.4). First we expand $\mathbb{E}W^t\nabla f(W)$. Define the real-valued, anti-symmetric function

$$F(w', w) := \tfrac{1}{2}(w' - w)^t\Lambda^{-t}\bigl(\nabla f(w') + \nabla f(w)\bigr) \tag{2.6}$$

for $w, w' \in \mathbb{R}^d$, and note that, because of exchangeability, $\mathbb{E}F(W', W) = 0$; see Stein (1986). Thus,

$$\begin{aligned}
0 &= \tfrac{1}{2}\mathbb{E}\{(W' - W)^t\Lambda^{-t}(\nabla f(W') + \nabla f(W))\} \\
&= \mathbb{E}\{(W' - W)^t\Lambda^{-t}\nabla f(W)\} + \tfrac{1}{2}\mathbb{E}\{(W' - W)^t\Lambda^{-t}(\nabla f(W') - \nabla f(W))\} \\
&= \mathbb{E}\{R^t\Lambda^{-t}\nabla f(W)\} - \mathbb{E}\{W^t\nabla f(W)\} + \tfrac{1}{2}\mathbb{E}\{(W' - W)^t\Lambda^{-t}(\nabla f(W') - \nabla f(W))\},
\end{aligned} \tag{2.7}$$

where we used (1.7) for the last step. Recalling the notation $f_{i,j}(x) = \frac{\partial^2}{\partial x_i\,\partial x_j}f(x)$, Taylor expansion gives

$$\begin{aligned}
(w' - w)^t\Lambda^{-t}\bigl(\nabla f(w') - \nabla f(w)\bigr)
&= \sum_{m,i,j}(\Lambda^{-1})_{m,i}(w_i' - w_i)(w_j' - w_j)f_{m,j}(w) \\
&\quad + \sum_{m,i,j,k}(\Lambda^{-1})_{m,i}(w_i' - w_i)(w_j' - w_j)(w_k' - w_k)\tilde R_{mjk},
\end{aligned}$$

where

$$|\tilde R_{mjk}| \le \frac{1}{2}\left\|\frac{\partial^3 f}{\partial w_m\,\partial w_j\,\partial w_k}\right\|. \tag{2.8}$$

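Thus, in (2.7),

$$\begin{aligned}
\mathbb{E}\{(W' - W)^t\Lambda^{-t}(\nabla f(W') - \nabla f(W))\}
&= \sum_{m,i,j}(\Lambda^{-1})_{m,i}\,\mathbb{E}\{(W_i' - W_i)(W_j' - W_j)f_{m,j}(W)\} \\
&\quad + \sum_{m,i,j,k}(\Lambda^{-1})_{m,i}\,\mathbb{E}\{(W_i' - W_i)(W_j' - W_j)(W_k' - W_k)\tilde R_{mjk}\}.
\end{aligned} \tag{2.9}$$

Now we turn our attention to $\mathbb{E}\nabla^t\Sigma\nabla f(W)$. Note that, because of (2.1), (1.7) and exchangeability,

$$\begin{aligned}
\mathbb{E}(W' - W)(W' - W)^t
&= \mathbb{E}\{W(W - W')^t\} + \mathbb{E}\{W'(W' - W)^t\} \\
&= 2\mathbb{E}\{W(\Lambda W - R)^t\} = 2\Sigma\Lambda^t - 2\mathbb{E}(WR^t) =: T.
\end{aligned} \tag{2.10}$$

Hence,

$$\begin{aligned}
\nabla^t\Sigma\nabla f(w)
&= \tfrac{1}{2}\nabla^tT\Lambda^{-t}\nabla f(w) + \nabla^t\mathbb{E}(WR^t)\Lambda^{-t}\nabla f(w) \\
&= \tfrac{1}{2}\sum_{m,i,j}(\Lambda^{-1})_{m,i}T_{i,j}\frac{\partial^2 f(w)}{\partial w_m\,\partial w_j} + \sum_{m,i,j}(\Lambda^{-1})_{m,i}\,\mathbb{E}(W_jR_i)\frac{\partial^2 f(w)}{\partial w_m\,\partial w_j}.
\end{aligned}$$

Combining this equation with (2.7) and (2.9),

$$\begin{aligned}
\bigl|\mathbb{E}\{\nabla^t\Sigma\nabla f(W) - W^t\nabla f(W)\}\bigr|
&\le \frac{1}{2}\Bigl|\sum_{m,i,j}(\Lambda^{-1})_{m,i}\,\mathbb{E}\bigl\{\bigl(T_{i,j} - (W_i' - W_i)(W_j' - W_j)\bigr)f_{m,j}(W)\bigr\}\Bigr| \\
&\quad + \frac{1}{2}\Bigl|\sum_{m,i,j,k}\mathbb{E}\{(\Lambda^{-1})_{m,i}(W_i' - W_i)(W_j' - W_j)(W_k' - W_k)\tilde R_{mjk}\}\Bigr| \\
&\quad + \bigl|\mathbb{E}\{R^t\Lambda^{-t}\nabla f(W)\}\bigr| + \Bigl|\sum_{m,i,j}(\Lambda^{-1})_{m,i}\,\mathbb{E}(W_jR_i)\,\mathbb{E}\frac{\partial^2 f(W)}{\partial w_m\,\partial w_j}\Bigr| \\
&\le \frac{|h|_2}{4}\sum_{i,j}\lambda^{(i)}\,\mathbb{E}\bigl|T_{i,j} - \mathbb{E}^W(W_i' - W_i)(W_j' - W_j)\bigr| + \frac{|h|_3}{12}B \\
&\quad + |h|_1\sum_i\lambda^{(i)}\,\mathbb{E}|R_i| + \frac{|h|_2}{2}\sum_{i,j}\lambda^{(i)}\,\mathbb{E}|W_jR_i|,
\end{aligned} \tag{2.11}$$

where we used (2.8) to obtain the second inequality, and Lemma 2.6 to obtain the last inequality. From the Cauchy-Schwarz inequality, $\mathbb{E}|R_i| \le \sqrt{\mathbb{E}R_i^2}$ and

$$\mathbb{E}|W_jR_i| \le \sqrt{\mathbb{E}W_j^2\,\mathbb{E}R_i^2} \le \|\Sigma\|^{1/2}\sqrt{\mathbb{E}R_i^2}.$$

The $C$-expression in (2.2) now follows from the last two terms of (2.11). Recalling that $\mathbb{E}(W' - W)(W' - W)^t = T$, this proves the first term of (2.2) from the first term of (2.11). □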

Sometimes we may wish to assess the distance to a normal distribution for which the covariance matrix $\Sigma_0$, while nonnegative definite, does not have full rank. Stein's method helps to derive a straightforward bound in this case also. The proof of the following proposition is straightforward and routine, noting that (2.3) remains valid if the covariance matrix is not of full rank.

Proposition 2.8. Let $X$ and $Y$ be $\mathbb{R}^d$-valued normal vectors with distributions $X \sim \mathrm{MVN}(0, \Sigma)$ and $Y \sim \mathrm{MVN}(0, \Sigma_0)$, where $\Sigma = (\sigma_{i,j})_{i,j=1,\dots,d}$ has full rank, and $\Sigma_0 = (\sigma^0_{i,j})_{i,j=1,\dots,d}$ is nonnegative definite. Let $h : \mathbb{R}^d \to \mathbb{R}$ have 2 bounded derivatives. Then

$$|\mathbb{E}h(X) - \mathbb{E}h(Y)| \le \frac{1}{2}|h|_2\sum_{i,j=1}^d|\sigma_{i,j} - \sigma^0_{i,j}|.$$

Using the triangle inequality and Theorem 2.1, we thus obtain a bound for a normal approximation even for a normal distribution with degenerate covariance matrix.
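
A one-dimensional sanity check of Proposition 2.8 can be written down in closed form. The sketch below is an illustration rather than part of the paper: it takes $d = 1$ and the test function $h = \cos$, for which $|h|_2 = 1$ and $\mathbb{E}\cos(X) = e^{-\sigma^2/2}$ when $X \sim N(0, \sigma^2)$, and compares the gap between two such expectations with the bound $\frac{1}{2}|\sigma^2 - \sigma_0^2|$. The degenerate case $\sigma_0^2 = 0$ (a point mass at $0$) is included. The function names and the chosen variances are hypothetical.

```python
import math


def expected_cos(variance):
    """E cos(X) for X ~ N(0, variance); equals exp(-variance / 2)."""
    return math.exp(-variance / 2.0)


def bound(variance, variance0, h2=1.0):
    """Right-hand side of Proposition 2.8 for d = 1 and |h|_2 = h2."""
    return 0.5 * h2 * abs(variance - variance0)


cases = [(1.0, 0.0), (1.0, 0.9), (2.5, 0.0), (0.3, 0.31)]
for var, var0 in cases:
    gap = abs(expected_cos(var) - expected_cos(var0))
    print(var, var0, gap, bound(var, var0), gap <= bound(var, var0))
```

The inequality holds in every case because $x \mapsto e^{-x/2}$ is Lipschitz with constant $1/2$ on $[0, \infty)$, which is exactly the mechanism behind the factor $\frac{1}{2}|h|_2$ in the proposition for this choice of $h$.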

3. Nonsmooth test functions. Following Rinott and Rotar (1996), let $\Phi$ denote the standard normal distribution in $\mathbb{R}^d$, and $\phi$ the corresponding density function. For $h : \mathbb{R}^d \to \mathbb{R}$ set

$$h_\delta^+(x) = \sup\{h(x + y) : |y| \le \delta\},$$
$$h_\delta^-(x) = \inf\{h(x + y) : |y| \le \delta\},$$
$$\tilde h(x, \delta) = h_\delta^+(x) - h_\delta^-(x).$$

Let $\mathcal{H}$ be a class of measurable functions $\mathbb{R}^d \to \mathbb{R}$ which are uniformly bounded by 1. Suppose that, for any $h \in \mathcal{H}$:

(C1) for any $\delta > 0$, $h_\delta^+(x)$ and $h_\delta^-(x)$ are in $\mathcal{H}$,

(C2) for any $d \times d$ matrix $A$ and any vector $b \in \mathbb{R}^d$, $h(Ax + b) \in \mathcal{H}$,

(C3) for some constant $a = a(\mathcal{H}, d)$,

$$\sup_{h \in \mathcal{H}}\Bigl\{\int_{\mathbb{R}^d}\tilde h(x, \delta)\,\Phi(dx)\Bigr\} \le a\delta. \tag{3.1}$$

Obviously we may assume $a > 1$.

The class of indicators of measurable convex sets is such a class; for this class, $a \le 2\sqrt{d}$; see Bolthausen and Götze (1993).

In the same way as in Rinott and Rotar (1996), we can show the following corollary. The presentation differs from Rinott and Rotar (1996), as we make the relationship to the bounds in Theorem 2.1 immediate and in that we allow for general $\Sigma$. The now fairly standard proof is found in Appendix A. We also note forthcoming work by Bhattacharya and Holmes (2007).

Let $W$ have mean vector $0$ and variance-covariance matrix $\Sigma$. If $\Lambda$ and $R$ are such that (1.7) is satisfied for $W$, then $Y = \Sigma^{-1/2}W$ satisfies (1.7) with $\hat\Lambda = \Sigma^{-1/2}\Lambda\Sigma^{1/2}$ and $R' = \Sigma^{-1/2}R$. We put

$$\hat\lambda^{(i)} = \sum_{m=1}^d\bigl|(\Sigma^{-1/2}\Lambda^{-1}\Sigma^{1/2})_{m,i}\bigr|,$$

as well as



A' = ^ aW /VarE>'5:srV2s^T]/2(|^^ _ w,){W^ - W,), 

i,j V '^'^ 

i,j,k r,s,t 

and 



i=l 



\ 



k 



Corollary 3.1. Let W be as in Theorem 2.1. Then, for all he H with 
\h\ < 1, there exists 7 = 'y{d) such that, with a> 1 as in (3.1), 

sup \Eh{W) - Eh{Z)\ < -f^ i^- D' log{T') + + C' + aVT' 



with 



2 



T' = ^(^D' + ^^ + D'^j and D' = ^ + C'd. 

If $A'$, $B'$ and $C'$ are $O(n^{-1/2})$, then we would obtain a bound of order $O(n^{-1/4})$. This is poorer than the $n^{-1/2}\log n$ type of bounds obtained in Rinott and Rotar (1996), but Rinott and Rotar (1996) obtain the improved rate by assuming that the random vectors are bounded.

4. The embedding method and applications.

4.1. General framework. Assume that an $\ell$-dimensional random vector $W_{(\ell)}$ of interest is given. Often, the construction of an exchangeable pair $(W_{(\ell)}, W'_{(\ell)})$ is straightforward. If, say, $W_{(\ell)} = W_{(\ell)}(X)$ is a function of i.i.d. random variables $X = (X_1, \dots, X_n)$, one can choose uniformly an index $I$ from $1$ to $n$, replace $X_I$ by an independent copy $X_I'$, and define $W'_{(\ell)} := W_{(\ell)}(X')$, where $X'$ is now the vector $X$ but with $X_I$ replaced by $X_I'$.

In general there is no hope that $(W_{(\ell)}, W'_{(\ell)})$ will satisfy condition (1.2) with $R$ being of the required smaller order or even equal to zero, so that in this case Theorem 2.1 would not yield useful bounds.

Surprisingly often it is possible, though, to extend $W_{(\ell)}$ to a vector $W \in \mathbb{R}^d$ such that we can construct an exchangeable pair $(W, W')$ which satisfies condition (1.2) with $R = 0$. If we can bound the distance of the distribution $\mathscr{L}(W)$ to a $d$-dimensional multivariate normal distribution, then a bound on the distance of the distribution $\mathscr{L}(W_{(\ell)})$ to an $\ell$-dimensional multivariate normal distribution follows immediately.

To explain the approach, we turn the problem on its head. Suppose that $W \in \mathbb{R}^d$ is such that we can construct an exchangeable pair $(W, W')$ which satisfies condition (1.2) with $R = 0$. Rename the first $\ell$ components to comprise $W_{(\ell)}$, so that

$$W = \begin{pmatrix} W_{(\ell)} \\ W^{(d-\ell)} \end{pmatrix}$$

and $W_{(\ell)} = I_{\ell,0}W$, with

$$I_{\ell,0} = (\mathrm{Id}_\ell, 0_{\ell \times (d-\ell)}),$$

$0_{\ell \times (d-\ell)}$ denoting the $\ell \times (d - \ell)$-matrix consisting entirely of 0's. Defining $W'_{(\ell)} = I_{\ell,0}W'$, it follows that $(W_{(\ell)}, W'_{(\ell)})$ forms an exchangeable pair. From (1.2),

$$\mathbb{E}^W(W'_{(\ell)} - W_{(\ell)}) = I_{\ell,0}\,\mathbb{E}^W(W' - W) = -I_{\ell,0}\Lambda W.$$

Now decompose the matrix $\Lambda$ as

$$\Lambda = \begin{pmatrix} \Lambda_{1,1} & \Lambda_{1,2} \\ \Lambda_{2,1} & \Lambda_{2,2} \end{pmatrix},$$

where $\Lambda_{1,1}$ denotes an $\ell \times \ell$ submatrix, $\Lambda_{1,2}$ denotes an $\ell \times (d - \ell)$ submatrix, and so on. Then

$$I_{\ell,0}\Lambda W = \Lambda_{1,1}W_{(\ell)} + \Lambda_{1,2}W^{(d-\ell)}$$

and, hence,

$$\mathbb{E}^W(W'_{(\ell)} - W_{(\ell)}) = -\Lambda_{1,1}W_{(\ell)} - \Lambda_{1,2}W^{(d-\ell)}.$$

Conditioning on $W_{(\ell)}$ gives that

$$\mathbb{E}^{W_{(\ell)}}(W'_{(\ell)} - W_{(\ell)}) = -\Lambda_{1,1}W_{(\ell)} - \Lambda_{1,2}\,\mathbb{E}^{W_{(\ell)}}W^{(d-\ell)}.$$

Thus, condition (1.2) is satisfied with

$$R = -\Lambda_{1,2}\,\mathbb{E}^{W_{(\ell)}}W^{(d-\ell)}. \tag{4.1}$$

If $\Lambda_{1,2} = 0$, then no embedding is required. But if $\Lambda_{1,2} \ne 0$, then the remainder $R$ in (1.2) is a nontrivial linear combination of random variables, and these random variables could serve as embedding vector. In order to obtain useful bounds in Theorem 2.1, the embedding dimension $d$ should not be too large. In the examples below it will be obvious how to choose $W^{(d-\ell)}$ to make the construction work.
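
The block decomposition behind (4.1) is easy to mechanize. The following Python sketch is illustrative only: it takes the 2-runs example of the Introduction with the coordinates reordered as $W = (V_2, V_1)$, so that the statistic of interest comes first ($\ell = 1$), splits $\Lambda$ into its four blocks, and reads off the coefficient of $\mathbb{E}^{V_2}V_1$ in the remainder of (4.1). The function name `split_blocks` and the values of $n$ and $p$ are assumptions made for the example.

```python
from fractions import Fraction


def split_blocks(Lambda, ell):
    """Split a d x d matrix into (L11, L12, L21, L22), L11 being ell x ell."""
    top, bottom = Lambda[:ell], Lambda[ell:]
    L11 = [row[:ell] for row in top]
    L12 = [row[ell:] for row in top]
    L21 = [row[:ell] for row in bottom]
    L22 = [row[ell:] for row in bottom]
    return L11, L12, L21, L22


n, p = 50, Fraction(1, 2)
# Coordinates ordered as W = (V2, V1): the first row encodes
# E(V2' - V2 | W) = -(2/n) V2 + (2p/n) V1, the second E(V1' - V1 | W) = -(1/n) V1.
Lambda = [[Fraction(2, n), -2 * p / n],
          [Fraction(0), Fraction(1, n)]]
L11, L12, L21, L22 = split_blocks(Lambda, 1)
needs_embedding = any(x != 0 for row in L12 for x in row)
# Coefficient matrix of E[V1 | V2] in the remainder R = -L12 E[V1 | V2] of (4.1).
remainder_coeff = [[-x for x in row] for row in L12]
print(L11, remainder_coeff, needs_embedding)
```

Here $\Lambda_{1,2} = -2p/n \ne 0$, so the one-dimensional problem carries the remainder $\frac{2p}{n}\mathbb{E}^{V_2}V_1$, in agreement with (1.8); adding $V_1$ as an auxiliary coordinate removes it.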



4.2. Runs on the line. Let $X = (\xi_1, \dots, \xi_n)$ be a sequence of independent random variables with distribution Bernoulli($p$), $0 < p < 1$, that is, $\mathbb{P}[\xi_1 = 1] = 1 - \mathbb{P}[\xi_1 = 0] = p$. For $d \ge 1$, define the (centered) number of $d$-runs as

$$V_d := \sum_{m=1}^n(\xi_m\xi_{m+1}\cdots\xi_{m+d-1} - p^d),$$

where, for convenience, we assume the torus convention that $\xi_{n+1} = \xi_1$, $\xi_{n+2} = \xi_2$ and so on.

As mentioned in the Introduction, if we want to use the obvious construction of an exchangeable pair, the univariate version of exchangeable pairs of Rinott and Rotar (1997) (Proposition 1.2) does not yield convergent bounds of $V_d$ to the standard normal distribution if $d > 1$. However, we can tackle the example with our approach by incorporating the auxiliary variables $V_1, \dots, V_{d-1}$, such that the problem becomes linear in a higher-dimensional setting.

We construct an exchangeable pair $(X, X')$, where instead of just one, we resample $d - 1$ of the $\xi_j$. To this end, let $I$ be uniformly distributed over $\{1, \dots, n\}$ and let $\xi_1', \dots, \xi_n'$ be independent copies of the $\xi_j$. Let $X'$ be the same as $X$ but with the subsequence $\xi_I, \xi_{I+1}, \dots, \xi_{I+d-2}$ of length $d - 1$ replaced by $\xi_I', \xi_{I+1}', \dots, \xi_{I+d-2}'$. Clearly $(X, X')$ forms an exchangeable pair. Define $V_i' := V_i(X')$; we have

$$\begin{aligned}
V_i' - V_i &= \sum_{m=I-i+1}^{I-1}\xi_m\cdots\xi_{I-1}\xi_I'\cdots\xi_{m+i-1}' + \sum_{m=I}^{I+d-i-1}\xi_m'\cdots\xi_{m+i-1}' \\
&\quad + \sum_{m=I+d-i}^{I+d-2}\xi_m'\cdots\xi_{I+d-2}'\xi_{I+d-1}\cdots\xi_{m+i-1} - \sum_{m=I-i+1}^{I+d-2}\xi_m\cdots\xi_{m+i-1},
\end{aligned} \tag{4.2}$$

where sums $\sum_a^b$ are defined to be zero if $a > b$. Now, (4.2) yields

$$\begin{aligned}
\mathbb{E}^{(V_1,\dots,V_d)}(V_i' - V_i)
&= -n^{-1}\bigl[(d + i - 2)V_i - 2pV_{i-1} - 2p^2V_{i-2} - \cdots - 2p^{i-1}V_1\bigr] \\
&= -n^{-1}\Bigl[(d + i - 2)V_i - 2\sum_{k=1}^{i-1}p^{i-k}V_k\Bigr].
\end{aligned} \tag{4.3}$$

From this representation we see that we may take $V_1, \dots, V_{d-1}$ as the auxiliary random variables.
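
Identity (4.3) can be verified exactly for a given sequence. The sketch below is an illustration and not the authors' computation: it assumes the block-resampling coupling described above, enumerates all positions of the block and all $2^{d-1}$ replacement patterns with their Bernoulli weights, and compares the resulting conditional expectation (given the whole sequence, hence also given the run counts) with the right-hand side of (4.3). The function names, the sequence and the parameters $p$ and $d$ are hypothetical choices; the sequence length is taken large enough that the affected windows do not wrap around onto each other.

```python
from fractions import Fraction
from itertools import product


def runs(xi, i, p):
    """Centered number of i-runs V_i of the 0/1 sequence xi (torus convention)."""
    n = len(xi)
    count = 0
    for m in range(n):
        if all(xi[(m + s) % n] for s in range(i)):
            count += 1
    return count - n * p ** i


def drift(xi, i, d, p):
    """Exact E[V_i' - V_i | xi] when a uniformly placed block of d - 1
    consecutive coordinates is replaced by independent Bernoulli(p) copies."""
    n = len(xi)
    base = runs(xi, i, p)
    total = Fraction(0)
    for start in range(n):
        for block in product((0, 1), repeat=d - 1):
            weight = Fraction(1, n)
            y = list(xi)
            for offset, value in enumerate(block):
                y[(start + offset) % n] = value
                weight *= p if value else 1 - p
            total += weight * (runs(y, i, p) - base)
    return total


def predicted(xi, i, d, p):
    """Right-hand side of (4.3)."""
    n = len(xi)
    v = {k: runs(xi, k, p) for k in range(1, i + 1)}
    tail = sum(p ** (i - k) * v[k] for k in range(1, i))
    return -((d + i - 2) * v[i] - 2 * tail) / n


p, d = Fraction(2, 5), 3
xi = [1, 1, 1, 0, 1, 1, 0, 0, 1]
checks = [drift(xi, i, d, p) == predicted(xi, i, d, p) for i in range(1, d + 1)]
print(checks)
```

The enumeration costs $n\,2^{d-1}$ evaluations per index $i$, which is negligible for the small $d$ used here.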

Straightforward calculations yield that, for all $i \ge j$,

$$\begin{aligned}
\mathbb{E}(V_iV_j) &= n\Bigl[(i - j + 1)p^i + 2\sum_{l=1}^{j-1}p^{i+j-l} - (i + j - 1)p^{i+j}\Bigr] \\
&= np^i(1 - p)\sum_{k=0}^{j-1}(i - j + 1 + 2k)p^k.
\end{aligned} \tag{4.4}$$

In particular,

$$\mathbb{E}V_i^2 = np^i(1 - p)\sum_{k=0}^{i-1}(1 + 2k)p^k, \tag{4.5}$$

which lies in the interval between $np^i(1 - p)$ and $np^i(1 - p)i^2$. Thus, we define the $W_i$ to be the weighted versions

$$W_i := \frac{V_i}{\sqrt{np^i(1 - p)}}, \tag{4.6}$$

and from (4.4) we have for general $i$ and $j$

$$\mathbb{E}(W_iW_j) = p^{|i-j|/2}\sum_{k=0}^{i \wedge j - 1}(|i - j| + 1 + 2k)p^k =: \sigma_{i,j}. \tag{4.7}$$

From (4.7) it is clear that the corresponding $\Sigma = (\sigma_{i,j})_{i,j}$ is constant for all $n$ and of full rank. For $p \to 0$, $\Sigma$ converges to uncorrelated coordinates and for $p \to 1$ to a matrix of rank 1. For applications and further references see Glaz, Naus and Wallenstein (2001) and Balakrishnan and Koutras (2002).
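
The closed form (4.4) can be compared with a brute-force computation on a short torus. The following Python sketch is a hedged illustration: it sums over all $2^n$ Bernoulli sequences for a small $n$ (chosen larger than $i + j$ so that two runs cannot overlap at both ends) and checks the result against (4.4) in exact arithmetic; it also evaluates $\sigma_{i,j}$ from (4.7) in floating point. The function names and the values $n = 7$, $p = 1/3$ are assumptions for the example.

```python
from fractions import Fraction
from itertools import product


def runs(xi, i, p):
    """Centered number of i-runs V_i of the 0/1 sequence xi (torus convention)."""
    n = len(xi)
    hits = sum(all(xi[(m + s) % n] for s in range(i)) for m in range(n))
    return hits - n * p ** i


def exact_moment(n, i, j, p):
    """E(V_i V_j), obtained by summing over all 2^n Bernoulli(p) sequences."""
    total = Fraction(0)
    for xi in product((0, 1), repeat=n):
        ones = sum(xi)
        prob = p ** ones * (1 - p) ** (n - ones)
        total += prob * runs(xi, i, p) * runs(xi, j, p)
    return total


def formula_moment(n, i, j, p):
    """Closed form (4.4), stated for i >= j."""
    return n * p ** i * (1 - p) * sum((i - j + 1 + 2 * k) * p ** k for k in range(j))


def sigma(i, j, p):
    """Entry sigma_{i,j} of (4.7), evaluated in floating point."""
    lo, gap, q = min(i, j), abs(i - j), float(p)
    return q ** (gap / 2) * sum((gap + 1 + 2 * k) * q ** k for k in range(lo))


n, p = 7, Fraction(1, 3)
pairs = [(i, j) for i in range(1, 4) for j in range(1, i + 1)]
agree = [exact_moment(n, i, j, p) == formula_moment(n, i, j, p) for i, j in pairs]
print(agree)
```

Evaluating `sigma(i, j, 1)` gives the product $ij$, which is one way to see the rank-one limit of $\Sigma$ as $p \to 1$ mentioned above.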
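Now, from (4.3) we have

$$\mathbb{E}^W(W_i' - W_i) = -n^{-1}\Bigl[(d + i - 2)W_i - 2\sum_{k=1}^{i-1}p^{(i-k)/2}W_k\Bigr].$$

Thus, (1.7) is satisfied with $R = 0$ and

$$\Lambda = \frac{1}{n}\begin{pmatrix}
d - 1 & & & & \\
-2p^{1/2} & d & & & \\
\vdots & \ddots & \ddots & & \\
-2p^{(k-1)/2} & \cdots & -2p^{1/2} & d + k - 2 & \\
\vdots & & & \ddots & \ddots \\
-2p^{(d-1)/2} & \cdots & \cdots & -2p^{1/2} & 2(d - 1)
\end{pmatrix},$$

that is, $\Lambda$ is lower triangular with $\Lambda_{i,i} = (d + i - 2)/n$ and $\Lambda_{i,k} = -2p^{(i-k)/2}/n$ for $k < i$.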

Theorem 4.1. With $W$ defined as in (4.6), $n > 2d - 1$ and $\Sigma$ given through (4.7), we have for three times differentiable functions $h$ that

Proof. Some rough estimates yield that, for all I < i, j,k < d, 



A« < i^, 
d 



VarE^(W^/ - Wi)iW- - Wj) < 



768^5 



jlSpd^l — p)2 ' 

6Ad^ 

Now apply Theorem 2.1. Details can be found in Appendix B. □ 



Remark 4.2. Although the bound is quite crude with respect to the 
dimension and hence mainly of theoretical interest, it is explicit. For small 
values of p or large values of d, however, Poisson approximation is more 
appropriate, and in these cases the bounds for normal approximation cannot 
be expected to be good unless n is very large. We also note that Vd exhibits 
a local dependence structure and thus also Stein's method using the local 
approach, such as in Rinott and Rotar (1996), could easily be used; and, of 
course, there is an abundance of results about m-dependent sequences. 

Remark 4.3. In the case of 2-runs, using the notation of (1.8) and the consequent paragraph, it is not difficult to see that, for any choice of $\lambda$ and defining $R = R(V_2, V_1) := \sigma^{-1}\bigl(\lambda V_2 - \frac{2}{n}V_2 + \frac{2p}{n}V_1\bigr)$, we have that $\lambda^{-1}\sqrt{\operatorname{Var}R}$ is at least of order 1 as $n \to \infty$, where $\sigma^2 := \operatorname{Var}V_2$. It may nevertheless be possible to choose $\lambda$ such that, with $\tilde R = \tilde R(V_2) := \mathbb{E}^{V_2}R = \sigma^{-1}\bigl(\lambda V_2 - \frac{2}{n}V_2 + \frac{2p}{n}\mathbb{E}^{V_2}V_1\bigr)$, we have $\lambda^{-1}\sqrt{\operatorname{Var}\tilde R} = o(1)$, so that a representation (1.2) could indeed be found with $R$ being of the required small order. But, whereas $\mathbb{E}^{V_2}V_1$ is hard to calculate, in this situation the application of the multivariate version (1.7) and Theorem 2.1 is straightforward.
4.3. Double-indexed permutation statistics. Let $a_{i,j,k,l}$, $1 \le i, j, k, l \le n$,
be real numbers such that aij^i^^i = whenever i = j but j. Assume that 

$$\sum_{i,j,k,l}a_{i,j,k,l} = 0 \tag{4.8}$$

and define

$$V_0 = V_0(\pi) = \sum_{s,t=1}^n a_{s,t,\pi(s),\pi(t)},$$

where $\pi$ is a uniformly drawn random permutation of size $n$. A Berry-Esseen bound for the distribution of $V_0$ was proved by Zhao et al. (1997) under quite general conditions, generalizing the proof of Bolthausen (1984), which is related to the exchangeable pair coupling. Under similar conditions as Zhao et al. (1997), Barbour and Chen (2005) used the exchangeable pair coupling to find a nontrivial representation of $V_0$ of the form (1.2) with a nonzero remainder term $R$; see their article also for a historical overview. Yet the problem is so rich that there is to date no result which unifies all the cases in which asymptotic normality holds. For example, the results in Barbour and Chen (2005) and in Zhao et al. (1997) do not cover the number of descents in a random permutation, for which asymptotic normality was derived in Fulman (2004) via exchangeable pairs.

We will discuss here only the applicability of this example to Theorem 2.1 
to illustrate the embedding method, which contrasts with Barbour and Chen 
(2005) in the sense that, with our approach, again one does not need to 
find a one-dimensional representation of the form (1.2) but can use directly 
the multidimensional version (1.7) in a straightforward manner. We also do 
not bound the error terms because the corresponding calculations are too 
involved for the purpose of this paper. 

Construct now an exchangeable pair as follows. Let $I$ and $J$ be distributed uniformly over $1, \dots, n$ conditioned that $I \ne J$. Define the permutation $\pi' = (\pi(I)\,\pi(J)) \circ \pi$ so that $\pi'$ is the permutation where $\pi'(k) = \pi(k)$ for all $k \ne I, J$, and where $\pi'(I) = \pi(J)$ and $\pi'(J) = \pi(I)$. Let now, for the sake of a simpler notation, $a^\pi_{i,j,k,l} := a_{i,j,\pi(k),\pi(l)}$. Defining $V_0' = V_0(\pi')$, we have

n 

^0 - ^0 = - ^{"'1,3,1,3 + a%,j,3 + 0'Xi,s,i + ^l,J,3,j) 

3 = 1 

n 

+ X! ("■?,«, "'J,3,I,S + ^^3,1,3, J + '^S,J,S,7) 

3 = 1 

~ ("■1,I,J,J + ^1,J,J,I + ^J,I,I,J + "'J,j,l,l)- 
Hence,



1 " 

1 " 

~l 7 1 \ ^ y ^ ^j.s,i,s ~^ ^s,i,s,j ~^ ^s,j,s,i) 



4 2 " 

= ^0 + TT T.T.Ks,j,s + < 

2 2 

/ 2n — 1 \ 
= \i ^1^0 + 14 + ^2] +R1 + R2, 

with A:=2/(n — 1) and where 

n A " A 

j=i *j=i *)i=i 



^« •= X! "i!!r(s) ^^"^ « = 1) 2, where 



s=l 

(1) _ 1 (2) _ 1 

Thus, the conditional expectation $\mathbb{E}^\pi(V_0' - V_0)$ can be decomposed into a sum of the original statistic $V_0$ and two related single-indexed permutation statistics, together with an error term. Now, for $i = 1, 2$,

$$V_i' - V_i = -a^{(i)}_{I,\pi(I)} - a^{(i)}_{J,\pi(J)} + a^{(i)}_{I,\pi(J)} + a^{(i)}_{J,\pi(I)}$$

and, thus,

$$\mathbb{E}^\pi(V_i' - V_i) = -\frac{2}{n - 1}V_i + \frac{2}{n(n - 1)}\sum_{s,t=1}^n a^{(i)}_{s,t} = -\lambda V_i,$$

where the last equality follows from (4.8). Thus, (1.7) holds for the vector $W = (V_0, V_1, V_2)^t$ with

-1 -l\ 

K-\ ^ 

A- A Q ^ Q 

V 1/ 

and $R = (R_1 + R_2, 0, 0)^t$.
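
The linearity of the single-indexed statistics under the transposition coupling can be checked by exact enumeration. The sketch below is illustrative and uses a generic array rather than the specific $a^{(i)}$ of the text: for a statistic $V = \sum_s b_{s,\pi(s)}$ whose coefficients sum to zero, it averages $V' - V$ over all ordered pairs $(I, J)$ with $I \ne J$ and compares the result with $-\lambda V$, where $\lambda = 2/(n - 1)$. The array `raw`, the permutation and the function names are hypothetical.

```python
from fractions import Fraction


def stat(b, perm):
    """Single-indexed permutation statistic V = sum_s b[s][perm[s]]."""
    return sum(b[s][perm[s]] for s in range(len(perm)))


def swap_drift(b, perm):
    """Exact E[V' - V | perm] when perm' swaps the images of a uniformly
    chosen ordered pair (I, J) with I != J."""
    n = len(perm)
    base = stat(b, perm)
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            if i != j:
                new = list(perm)
                new[i], new[j] = perm[j], perm[i]
                total += Fraction(stat(b, new) - base, n * (n - 1))
    return total


# Hypothetical integer array, recentred so that all entries sum to zero.
raw = [[3, -1, 4, 1], [5, -9, 2, 6], [-5, 3, 5, -8], [9, 7, -9, 3]]
n = len(raw)
mean = Fraction(sum(map(sum, raw)), n * n)
b = [[x - mean for x in row] for row in raw]
perm = [2, 0, 3, 1]
lam = Fraction(2, n - 1)
print(swap_drift(b, perm) == -lam * stat(b, perm))
```

Without the centring step the drift picks up the additional constant $\frac{2}{n(n-1)}\sum_{s,t}b_{s,t}$, which is why a zero-sum condition such as (4.8) is needed for an exact linear relation.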

In the special case where Uijki = hjCki with bu = Qj = for all k, I and 
where (bij) or (cki) is symmetric up to a (possibly negative) constant, we 
have Ri = and i?2 = (3Xn~^Vo for some number /3, so that (1.7) holds with 
an i? = and a slightly different A, which would simplify the estimates. 
Note that these assumptions hold, for example, if either (bij) or (cjj) is the 
adjacency matrix of an undirected graph containing no self-loops. 

Mann-Whitney-Wilcoxon statistic. Let $x_1, \dots, x_{n_x}$ and $y_1, \dots, y_{n_y}$, $n_x + n_y = n$, be independent random samples from unknown distributions $F_X$ and $F_Y$, respectively. The MWW-statistic is then defined to be the number of pairs $(x_i, y_j)$ such that $x_i < y_j$. Let $\pi(i)$ be the rank of $z_i$, where $z = (x_1, \dots, x_{n_x}, y_1, \dots, y_{n_y})$ is the combined sample. To test the hypothesis $H_0 : F_X = F_Y$, we may assume that $\pi$ has uniform distribution. It is easy to see that, defining

{+^, if 1 < i < Tlx, nx + I <: j <: n and 1 < k < I < n, 
— ^, if 1 < i < fix, Tlx + I < j < n and 1 < I < k <n, 
0, else, 

$V_0$ is equivalent to the MWW-statistic (up to a shift). It is well known that $\operatorname{Var}V_0 = n_xn_y(n + 1)/12$ [see Mann and Whitney (1947)], so that if, for some $0 < \alpha < 1$, $n_x \asymp \alpha n$ and $n_y \asymp (1 - \alpha)n$, respectively, we have $\operatorname{Var}V_0 \asymp n^3$.
Note now that, as ai^i^k,i = for all i-,k,l and as J2i,j o,i,j,w{j),TT(i) = 

C'i,j,n{i),n{j)i wc have i?i = and R2 = — ^Vq. Hence, the remainder 
term C in Theorem 2.1 has the required lower order. 

Further, we calculate that a\^j = ""^"^n"''^^^ if 1 < "i < and a^/'j = 
otherwise, and therefore, using the variance formula for the usual singly 
indexed permutation statistics [see Hoeffding (1951)], 

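$$\operatorname{Var} V_1 = \frac{1}{n-1} \sum_{i,j=1}^{n} \bigl(a^{(1)}_{i,j} - a^{(1)}_{i,\cdot} - a^{(1)}_{\cdot,j} + a^{(1)}_{\cdot,\cdot}\bigr)^2 \asymp n^3,$$

where a dot in place of an index denotes the average over that index.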

The same asymptotic is true for $V_2$, so that indeed $W = n^{-3/2}(V_0, V_1, V_2)$
with the above coupling and choice of $\Lambda$ is a good candidate for Theorem 2.1.
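The variance formula for singly indexed permutation statistics used here can be checked by brute force on a small matrix. The sketch below assumes the statistic is $\sum_i a_{i,\pi(i)}$ with $\pi$ uniform, and that dotted indices denote averages; the matrix entries are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import permutations


def exact_variance(a):
    """Variance of sum_i a[i][pi(i)] over all permutations pi, by enumeration."""
    n = len(a)
    vals = [sum(a[i][p[i]] for i in range(n)) for p in permutations(range(n))]
    mean = Fraction(sum(vals), len(vals))
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def hoeffding_variance(a):
    """(1/(n-1)) * sum_{i,j} (a_ij - rowmean_i - colmean_j + grandmean)^2."""
    n = len(a)
    row = [Fraction(sum(a[i]), n) for i in range(n)]
    col = [Fraction(sum(a[i][j] for i in range(n)), n) for j in range(n)]
    tot = Fraction(sum(map(sum, a)), n * n)
    return sum((a[i][j] - row[i] - col[j] + tot) ** 2
               for i in range(n) for j in range(n)) / (n - 1)


# Arbitrary 4 x 4 integer matrix, chosen only for illustration.
a = [[3, -1, 4, 1], [5, 9, -2, 6], [5, 3, 5, -8], [9, 7, 9, 3]]
```

Both functions return exact fractions, so they can be compared with `==`.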
5. Some comments on the exchangeability condition. Exchangeability
is used twice in the proof of Theorem 2.1, namely, in (2.7) and (2.10). In 
this section we discuss the necessity of this condition if one uses the Stein 
operator of the form in equation (2.4). 

5.1. Exchangeability and anti-symmetric functions. In (2.7), we use
exchangeability in the spirit of Stein (1986). It has been proved by Röllin
(2008) that in the one-dimensional setting the exchangeability condition can
be omitted for normal approximation by replacing the usual anti-symmetric
function (2.6) with $F(w, w') = g(w') - g(w)$, where now only equality in
distribution is needed to obtain an identity similar to (2.7). Chatterjee and
Meckes (2008) also proved their results with this new function $F$ but under
the stronger condition (1.4). However, there seems to be no obvious way to
apply the above approach under the more general assumption (1.7) (even
with $R = 0$) to remove the exchangeability condition. To see this, note that,
by multivariate Taylor expansion,

(5.1)  $g(w') = g(w) + (w' - w)^t \nabla g(w) + \tfrac{1}{2} \nabla^t (w' - w)(w' - w)^t \nabla g(w) + r(w', w),$

where r is the corresponding remainder term of the expansion. Thus, (5.1) 
and (1.7) yield the identity 



(5.2)  $0 = \mathbb{E} g(W') - \mathbb{E} g(W) = -\mathbb{E}\{W^t \Lambda^t \nabla g(W)\} + \tfrac{1}{2} \mathbb{E}\{\nabla^t (W' - W)(W' - W)^t \nabla g(W)\} + \mathbb{E} r(W', W),$



for any suitable function $g$. To optimally match (5.2) and the left-hand side
of (2.4), we have to choose $g$ such that the system

(5.3)  $\Lambda^t \nabla g = \nabla f$

is satisfied. In the one-dimensional setting of Röllin (2008) and the multivariate
setting $\Lambda = \lambda\,\mathrm{Id}$ of Chatterjee and Meckes (2008), (5.3) can be solved by
setting $g = \lambda^{-1} f$. In general, however, (5.3) cannot be solved; only if $\Lambda = \lambda\,\mathrm{Id}$
does (5.3) have a twice continuously partially differentiable solution $g$ for a
sufficiently large class of functions $f$.

5.2. Exchangeability, the covariance matrix and the $\Lambda$ matrix. In (2.10),
using only equality in distribution instead of exchangeability, we obtain

(5.4)  $\mathbb{E}\{(W' - W)(W' - W)^t\} = \Lambda\Sigma + \Sigma\Lambda^t.$

It is clear from (2.11) that the canonical choice for the variance structure of
the approximating multivariate normal distribution would then be

(5.5)  $\tfrac{1}{2} \mathbb{E}\{(W' - W)(W' - W)^t\} \Lambda^{-t} = \tfrac{1}{2}(\Lambda\Sigma\Lambda^{-t} + \Sigma) =: \tilde\Sigma,$

which in the exchangeable setting reduces to $\Sigma$; see (2.10).

It is easy to see that $\tilde\Sigma = \Sigma$ if and only if $\hat\Lambda := \Sigma^{-1/2} \Lambda \Sigma^{1/2}$, arising from
standardization (see Remark 2.4), is symmetric. If $(W, W')$ is exchangeable,
we have from (2.10) that $\tilde\Sigma = \Sigma$ and, hence, $\hat\Lambda$ is symmetric. While
exchangeability of $(W, W')$ is not a necessary condition for $\hat\Lambda$ to be symmetric, the
following example illustrates that nonsymmetric $\hat\Lambda$ is far from unusual.

Example 5.1. Let $d$ be a positive integer, $d \ge 4$. Let $X(k) = (X_i(k); i =
1, \dots, d)$, $k \in \mathbb{N}$, be a discrete time Markov chain with values in $\{-1, 1\}^d$
and with the following transition rule. At every time step $k$, pick uniformly
an index $I$ from $\{1, 2, \dots, d\}$. Then with probability 1/2, let $X_I(k + 1) =
-X_{I-1}(k)$, and with probability 1/2, let $X_I(k + 1) = X_{I+1}(k)$, where we
interpret the indices $0$ and $d + 1$ as $d$ and $1$, respectively. For all $j \ne I$, put
$X_j(k + 1) = X_j(k)$. Observe that, for arbitrary $k$ and $i \ne j$,

$$\mathbb{E}[X_i(k+1) X_j(k+1) \mid X(k)] = \frac{1}{2d} \bigl(X_{i+1}(k) - X_{i-1}(k)\bigr) X_j(k) + \frac{1}{2d} X_i(k) \bigl(X_{j+1}(k) - X_{j-1}(k)\bigr) + \frac{d-2}{d} X_i(k) X_j(k).$$

Now, if $\mathbb{E}\{X_i(k) X_j(k)\} = 0$ for all $i \ne j$, then also $\mathbb{E}\{X_i(k+1) X_j(k+1)\} = 0$
(where the case $j \in \{i-1, i+1\}$ is slightly different than for the other $j$).
Thus, if we start the chain such that the $X_i$ are uncorrelated and centered,
then, by induction, the $X_i$ are uncorrelated for every $k$ and it is easy to see
from this that also the equilibrium distribution of the chain has uncorrelated $X_i$.
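The induction step can be checked exactly for a small dimension. The sketch below assumes the transition rule as stated in Example 5.1 (the chosen coordinate is replaced by minus its left neighbour or by its right neighbour, each with probability 1/2, with cyclic indices) and starts from the uniform distribution on $\{-1,1\}^d$, which is one centered and uncorrelated starting law. The helper name `step_distribution` is illustrative.

```python
from fractions import Fraction
from itertools import product


def step_distribution(x):
    """All one-step transitions from state x, with their probabilities."""
    d = len(x)
    out = []
    for I in range(d):  # chosen index, probability 1/d
        # Replace X_I by -X_{I-1} or by X_{I+1}, each with probability 1/2.
        for sign, src in ((-1, (I - 1) % d), (1, (I + 1) % d)):
            y = list(x)
            y[I] = sign * x[src]
            out.append((tuple(y), Fraction(1, 2 * d)))
    return out


d = 4
states = list(product((-1, 1), repeat=d))
p0 = Fraction(1, len(states))  # uniform start: centered and uncorrelated

# First and second moments after one step of the chain.
mean = [Fraction(0)] * d
second = [[Fraction(0)] * d for _ in range(d)]
for x in states:
    for y, q in step_distribution(x):
        for i in range(d):
            mean[i] += p0 * q * y[i]
            for j in range(d):
                second[i][j] += p0 * q * y[i] * y[j]
```

After one step the coordinates are still centered and the second-moment matrix is still the identity, in line with the induction argument.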

Assume that $X^{(1)}, X^{(2)}, \dots$ is a sequence of mean zero independent and
identically distributed $d$-vectors with finite $\Sigma := \mathbb{E}\{X^{(1)} (X^{(1)})^t\}$. It is clear
from the multivariate CLT [see, for example, Rotar (1997), page 363,
Theorem 4] that $W = n^{-1/2} \sum_{i=1}^{n} X^{(i)}$ converges to the multivariate mean zero
normal distribution with covariance matrix $\Sigma$.

However, consider the following coupling construction. Let $X^{(i)}$ have the
equilibrium distribution of the above Markov chain and for each $i$ let $X'^{(i)}$
be the value one step ahead in the Markov chain, such that the pairs
$(X^{(i)}, X'^{(i)})$ are independent for different $i$. Define now $W' = W + n^{-1/2}(X'^{(I)} -
X^{(I)})$, where $I$ is uniformly distributed on $\{1, \dots, n\}$, and note that $\mathcal{L}(W') =
\mathcal{L}(W)$. We calculate that $\mathbb{E}^{X^{(i)}}(X'^{(i)} - X^{(i)}) = -\Lambda X^{(i)}$ with

$$\Lambda_{i,j} = \begin{cases} d^{-1}, & \text{if } j = i, \\ (2d)^{-1}, & \text{if } j = i - 1, \\ -(2d)^{-1}, & \text{if } j = i + 1, \\ 0, & \text{else.} \end{cases}$$

Then $\mathbb{E}^{W}(W' - W) = -n^{-1} \Lambda W$. As $\Lambda$ is not symmetric, $(W, W')$ cannot be
exchangeable, and so Theorem 2.1 cannot be applied with this coupling.
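The linearity condition for a single chain step, and the lack of symmetry of the resulting matrix, can be verified by enumeration. The sketch below derives the entries of $\Lambda$ directly from the transition rule of Example 5.1 ($1/d$ on the diagonal, $1/(2d)$ for the left neighbour, $-1/(2d)$ for the right neighbour, cyclic indices); this derivation and the function names `drift` and `lam` are assumptions of the sketch rather than quotations from the paper.

```python
from fractions import Fraction
from itertools import product


def drift(x):
    """Exact E[X(k+1) - X(k) | X(k) = x] under the transition rule."""
    d = len(x)
    out = [Fraction(0)] * d
    for I in range(d):  # coordinate I changes only if it is chosen (prob. 1/d)
        # New value is -X_{I-1} or X_{I+1}, each with probability 1/2.
        new_mean = Fraction(-x[(I - 1) % d] + x[(I + 1) % d], 2)
        out[I] = (new_mean - x[I]) / d
    return out


def lam(d):
    """Candidate matrix Lambda with E[X' - X | X] = -Lambda X."""
    L = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d):
        L[i][i] = Fraction(1, d)
        L[i][(i - 1) % d] = Fraction(1, 2 * d)
        L[i][(i + 1) % d] = Fraction(-1, 2 * d)
    return L


d = 5
L = lam(d)
# Check the linearity condition on every state of {-1, 1}^d.
linear = all(
    drift(x) == [-sum(L[i][j] * x[j] for j in range(d)) for i in range(d)]
    for x in product((-1, 1), repeat=d)
)
symmetric = all(L[i][j] == L[j][i] for i in range(d) for j in range(d))
```

Here `linear` confirms the regression identity state by state, while `symmetric` records whether the matrix is symmetric; the off-diagonal entries have opposite signs, so it is not.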



APPENDIX A: PROOF OF COROLLARY 3.1 
For $h \in \mathcal{H}$ define the following smoothing:

$$h_s(x) = \int_{\mathbb{R}^d} h(s^{1/2} y + (1 - s)^{1/2} x)\, \Phi(dy), \qquad 0 < s < 1.$$



The following key result for this smoothing can be found in Götze (1991).

Lemma A.1. Let $Q$ be a probability measure on $\mathbb{R}^d$, and let $W \sim Q$, $Z \sim \Phi$.
Let $a$ be as in (3.1). Then there exists a constant $\gamma > 0$ which depends
only on the dimension $d$ such that, for $0 < t < 1$,

$$\sup_{h \in \mathcal{H}} |\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \gamma \Bigl[\sup_{h \in \mathcal{H}} |\mathbb{E}(h - \Phi h)_t(W)| + a\sqrt{t}\Bigr].$$

To prove Corollary 3.1, first we assume that $\Sigma = \mathrm{Id}$. Let $0 < t < 1$. The
solution of (2.4) for $h_t$ is $\psi_t(x) = -\frac{1}{2} \int_t^1 \frac{h_s(x) - \Phi h}{1 - s}\, ds$, and for $|h| \le 1$, it is shown
in Götze (1991) and also in Loh (2008) that there is a constant $\gamma = \gamma(d)$
depending only on the dimension $d$ such that

(A.1)  $|\psi_t|_1 \le \gamma, \qquad |\psi_t|_2 \le \gamma \log(t^{-1});$

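the $\gamma$ is in general not equal to the $\gamma$ in Lemma A.1. Then, as in (2.11),

(A.2)  $|\mathbb{E} h_t(W) - \mathbb{E} h_t(Z)| = |\mathbb{E}\{\nabla^t \nabla \psi_t(W) - W^t \nabla \psi_t(W)\}|$
       $\le \frac{1}{2} \sum_{m,i,j,k} |(\Lambda^{-1})_{m,i}\, \mathbb{E}(W_i' - W_i)(W_j' - W_j)(W_k' - W_k) R_{mjk}| + \frac{\gamma}{4} \log(t^{-1}) A + \gamma C (1 + d \log(t^{-1})),$

with $A$, $B$ and $C$ as in Theorem 2.1. For the last step we used the same
estimates as applied for the remainder term in (2.11), and that $\Sigma = \mathrm{Id}$.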

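For the remainder term $R_{mjk}$, in Loh (2008), Lemma 1 (page 1992), it
is shown that, if $|h| \le 1$, then there is a constant $c_0$ (depending only on $d$)
such that, for any finite signed measure $Q$ on $\mathbb{R}^d$,

$$\sup_{1 \le p,q,r \le d} \Bigl|\int_{\mathbb{R}^d} \frac{\partial^3}{\partial z_p\, \partial z_q\, \partial z_r} \psi_t(z)\, Q(dz)\Bigr| \le \frac{c_0}{\sqrt{t}} \sup_{0 \le s \le 1,\ y \in \mathbb{R}^d} \Bigl|\int_{\mathbb{R}^d} h(sv + y)\, Q(dv)\Bigr|.$$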
Thus, we can bound the second term in (A.2) by $\frac{c_0}{2\sqrt{t}} B$. For simplicity, we
relabel $\gamma$ as the maximum of $\gamma$, $\gamma^2$ and $\gamma c_0$, yielding that

$$\sup_{h \in \mathcal{H}} |\mathbb{E} h(W) - \mathbb{E} h(Z)| \le \gamma^2 \Bigl(D \log(t^{-1}) + \frac{1}{2} B t^{-1/2} + C + a\sqrt{t}\Bigr)$$

with $D = \frac{A}{4} + Cd$. The minimum with respect to $t$ is attained for
$T = \frac{1}{a^2} \bigl(D + \sqrt{\frac{aB}{2} + D^2}\bigr)^2$, which gives the assertion for $\Sigma = \mathrm{Id}$.
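The minimizing value can be checked numerically. Setting the derivative of $t \mapsto D\log(t^{-1}) + \frac{1}{2}Bt^{-1/2} + C + a\sqrt{t}$ to zero and substituting $u = \sqrt{t}$ gives the quadratic $\frac{a}{2}u^2 - Du - \frac{B}{4} = 0$, whose positive root is $u = (D + \sqrt{aB/2 + D^2})/a$. The sketch below compares the resulting $T$ with nearby values of $t$; the numerical values of $a$, $B$, $C$, $D$ are hypothetical and chosen only for illustration.

```python
import math


def bound(t, a, B, C, D):
    """Bracketed expression of the bound, as a function of t."""
    return D * math.log(1 / t) + 0.5 * B / math.sqrt(t) + C + a * math.sqrt(t)


def stationary_point(a, B, D):
    """T = ((D + sqrt(a*B/2 + D^2)) / a)^2, the positive root in u = sqrt(t)."""
    return ((D + math.sqrt(a * B / 2 + D * D)) / a) ** 2


# Hypothetical values, not taken from the paper.
a, B, C, D = 3.0, 0.02, 0.01, 0.05
T = stationary_point(a, B, D)
best = bound(T, a, B, C, D)
```

Because the derivative changes sign only once on $(0, \infty)$, the stationary point is the global minimizer there.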
To complete the proof for general $\Sigma$, we standardize

$$Y = \Sigma^{-1/2} W.$$

From condition (C2), we have that for any $d \times d$ matrix $A$ and any vector
$b \in \mathbb{R}^d$, $h(Ax + b) \in \mathcal{H}$, so, in particular, $h(\Sigma^{-1/2} x) \in \mathcal{H}$. Hence, the above
bounds (A.1) can be applied directly. The proof now continues as for the
$\Sigma = \mathrm{Id}$ case, but with the standardized variables. We omit the details.

APPENDIX B: DETAILS OF THE RUNS EXAMPLE 

We first show the following lemma, which may be useful when the nondiagonal
entries of $\Lambda$ are small compared to the diagonal entries.

Lemma B.1. Assume that $\Lambda$ is lower triangular and assume that there
is $a > 0$ such that $|\Lambda_{ij}| \le a$ for all $j < i$. Then, with $\gamma := \inf_j |\Lambda_{jj}|$,

$$\sup_i \lambda^{(i)} \le \frac{(a/\gamma + 1)^{d-1}}{\gamma}.$$

Proof. Write $\Lambda = \Lambda_E \Lambda_D$, where $\Lambda_D$ is diagonal with the same diagonal
as $\Lambda$ and $\Lambda_E$ is lower triangular with diagonal entries equal to
1 and $(\Lambda_E)_{ij} := \Lambda_{ij}/\Lambda_{jj}$. Denote by $\|\cdot\|_p$ the usual $p$-norm for matrices
and recall that, for any matrix $A$, $\|A\|_1 = \sup_j \sum_i |A_{i,j}|$. Then, $\lambda^{(i)} \le
\|\Lambda^{-1}\|_1 \le \|\Lambda_E^{-1}\|_1 \|\Lambda_D^{-1}\|_1$. Noting that $|(\Lambda_E)_{ij}| \le a/\gamma$ for all $j < i$, we have
from Lemeire (1975) that $\|\Lambda_E^{-1}\|_1 \le (a/\gamma + 1)^{d-1}$. Now, as $\|\Lambda_D^{-1}\|_1 = \gamma^{-1}$,
the claim follows. □
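The bound of Lemma B.1 can be tested numerically on random lower triangular matrices. The sketch below assumes that $\lambda^{(i)}$ is the $i$-th absolute column sum of $\Lambda^{-1}$, so that $\sup_i \lambda^{(i)} = \|\Lambda^{-1}\|_1$, and takes $a$ to be the largest absolute off-diagonal entry; the helper names and the sampling ranges are illustrative.

```python
import random


def inverse_lower_triangular(L):
    """Invert a lower triangular matrix column by column (forward substitution)."""
    d = len(L)
    inv = [[0.0] * d for _ in range(d)]
    for c in range(d):
        for r in range(c, d):
            rhs = 1.0 if r == c else 0.0
            s = sum(L[r][k] * inv[k][c] for k in range(c, r))
            inv[r][c] = (rhs - s) / L[r][r]
    return inv


def max_column_sum(M):
    """Matrix 1-norm: the largest absolute column sum."""
    d = len(M)
    return max(sum(abs(M[r][c]) for r in range(d)) for c in range(d))


def lemma_bound(L):
    """(a/gamma + 1)^(d-1) / gamma with a, gamma read off the matrix."""
    d = len(L)
    a = max((abs(L[i][j]) for i in range(d) for j in range(i)), default=0.0)
    gamma = min(abs(L[j][j]) for j in range(d))
    return (a / gamma + 1) ** (d - 1) / gamma


random.seed(0)
ok = True
for _ in range(200):
    d = random.randint(2, 6)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        L[i][i] = random.choice((-1, 1)) * random.uniform(0.5, 3.0)
        for j in range(i):
            L[i][j] = random.uniform(-2.0, 2.0)
    lhs = max_column_sum(inverse_lower_triangular(L))
    # Small relative tolerance to absorb floating-point rounding.
    ok = ok and lhs <= lemma_bound(L) * (1 + 1e-9)
```

A random test of this kind does not replace the proof, but it is a cheap guard against misreading the exponent or the role of $\gamma$.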



Fix now $i$ and $j$. From (4.2) it is not difficult to see that we can find two
sequences $A_1, \dots, A_{N_{ij}}$ and $B_1, \dots, B_{N_{ij}}$ of subsets of $\{-d+1, \dots, 2d-3\}$
such that



1 iiif 

iE^'^>/ - v.){v^ - = ^ E E n n c+z 

(B.l) 



TL 



1 

n , 
From (4.2) it is easy to see that $N_{ij} \le 4(d + i - 2)(d + j - 2) \le 16d^2$, as $V_i' - V_i$
(respectively $V_j' - V_j$) contains no more than $2(d + i - 2)$ [respectively $2(d +
j - 2)$] summands. Note that $|A_k| + |B_k| \ge i \vee j$, that is, every summand in
(B.1) is the product of at least $i \vee j$ independent random indicators. Hence,
it is not difficult to see that

(B.2) Var(z.''^(m)) < 256dV^'- 

Now, 

VarE^(Ty/ - Wi){W^ - Wj) 

< o u- /-, ^ Var E«'«'(y/ - Vi)iV' - V,) 

1 

Cov(zy*'^(?Ti),i/^'^(m')). 



„4^i+j(l _p)2 



m,m'=l 



If $|m - m'| > 3d$, we have $\operatorname{Cov}(\nu^{i,j}(m), \nu^{i,j}(m')) = 0$ because $\nu^{i,j}(m)$ and
$\nu^{i,j}(m')$ are independent. If $|m - m'| \le 3d$, we can apply (B.2) to estimate
the covariances and, hence, we obtain



VarE^(VF/ - Wi){W' - Wj) < 



Similar arguments lead to the estimate 

- Vi){V^ - Vj){Vl - Vk)\ < 64d^p"^^-ii'^M^ 
hence, for the second summand in (2.2), 



E\{Wl - Wi)[W] - W.^[W'^ - Wk)\ < 



3/2p(i+i+fc)/2(-x _p)3/2- 



n 



Applying Lemma B.1 to the matrix $n\Lambda$ with $a = 2$ and $\gamma = d - 1$, we
obtain

$$\lambda^{(i)} \le \frac{n (2/(d-1) + 1)^{d-1}}{d - 1} \le \frac{15n}{d}.$$

Combining all estimates with Theorem 2.1 proves Theorem 4.1.
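The numerical constant in the last display can be checked directly: with $a = 2$ and $\gamma = d - 1$, the lemma gives the factor $(2/(d-1) + 1)^{d-1}/(d-1)$, and the claim is that this is at most $15/d$. The sketch below evaluates the ratio over a range of $d$; the range $2 \le d \le 2000$ is an arbitrary illustrative choice.

```python
def lemma_constant(d):
    """(2/(d-1) + 1)^(d-1) / (d-1), the factor multiplying n in the display."""
    return (2 / (d - 1) + 1) ** (d - 1) / (d - 1)


# Largest value of d * lemma_constant(d); the claim requires it to be at most 15.
worst = max(d * lemma_constant(d) for d in range(2, 2001))
```

Since $(1 + 2/m)^m$ increases to $e^2$ and $d/(d-1) \le 2$, the product stays below $2e^2 < 15$ for every $d \ge 2$, so the constant 15 leaves some room.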